A bias correction to Akaike's information criterion (AIC) is derived for seemingly unrelated regressions
models. The correction is of particular use when the sample size is not much larger than the number of
fitted parameters. A small-sample simulation study indicates that the bias-corrected AIC (AICc) provides
better model choices than other model selection criteria.



I. INTRODUCTION

The selection of a model from a set of fitted candidate models requires objective data-driven criteria. One such
criterion often used in practice is Akaike's information criterion (AIC), which was designed to be an asymptotically
unbiased estimator of the expected Kullback-Leibler information of a fitted model [1]. In finite samples, AIC has a
non-vanishing bias that depends on the number of fitted parameters. This limits its effectiveness as a model selection
criterion, particularly in instances where the sample size is not much larger than the number of fitted parameters of
the most complex candidate model. For such instances, Hurvich and Tsai extended the bias-corrected AIC (AICc),
originally suggested by Sugiura [3] for linear regression models, to non-linear regression models and autoregressive
models. Also, Hurvich and Tsai [2] demonstrated the small-sample superiority of AICc over AIC as a model selection
criterion. Since then, AICc has been extended to many other models, such as autoregressive moving average models
[4], vector autoregressive models [5] and multivariate linear regression models [6].

The objective of this work is to define AICc for seemingly unrelated regressions models. These are models of
multiple response variables that follow a joint distribution [7, 8]. In contrast to the multivariate linear regression
model of Ref. [6], the response variables of a seemingly unrelated regressions model do not need to depend on the
same covariates. Seemingly unrelated regressions models play a central role in econometrics [9] but also appear in
other contexts [10, 11, 12].

The remainder of this paper is organized as follows. In Sec. II, the bias of AIC is calculated in seemingly unrelated
regressions models with the assumption that the candidate model is either correctly specified or overspecified. The
same assumption is required for AIC to be asymptotically unbiased [13] and has been used to calculate its bias in
finite samples in other models [?, ?, 6]. Expanded in inverse powers of the sample size $N$, the bias of AIC
($\mathcal{B}_{\mathrm{AIC}}$) takes the form $\mathcal{B}_{\mathrm{AIC}} = -N^{-1}\beta(\Sigma_0) + o(N^{-1})$, where the positive
coefficient $\beta(\Sigma_0) = O(1)$ depends on the unknown true $p \times p$ covariance matrix $\Sigma_0$ of the $p$ response
variables. In Sec. II, a lower bound $\beta^* > 0$ of $\min_\Omega \beta(\Omega)$, where the minimization is over all $p \times p$
symmetric positive definite matrices $\Omega$, is found in terms of the number of fitted parameters and AICc is defined
as $\mathrm{AICc} = \mathrm{AIC} + N^{-1}\beta^*$. The performance of AICc as a model selection criterion is simulated in Sec. III
and compared to that of AIC and the Bayesian information criterion (BIC) of Schwarz [14]. Finally, we give some
concluding remarks in Sec. IV. Details about the calculation of $\beta(\Sigma_0)$, its lower bound $\beta^*$ and the simulation
study are given in, respectively, Appendices A, B and C. Appendix C also holds additional simulation results.



II. AIC AND AICc 



We consider the seemingly unrelated regressions model

$$ Y = ZB + U. \tag{1} $$

Here, $Y$ is an $N \times p$ matrix of $p$ response variables on $N$ subjects, $Z$ is a known $N \times M$ matrix of $N$ values of $M$
covariates, each row of the $N \times p$ matrix $U$ has independent $N_p(0, \Sigma)$ distribution, and $B$ is an $M \times p$ matrix holding
$K < Mp$ regression coefficients and $(Mp - K)$ zeroes. The restriction $B_{ij} = 0$ means that response variable $y_j$ of the
$j$-th column of $Y$ does not depend on covariate $z_i$ of the $i$-th column of $Z$. The row indices of the elements of the $j$-th
column of $B$ that are not restricted to zero are collected in the set $\mathcal{J}_j$. Each column of the matrix $B$ holds at least
one regression coefficient, which means that $\mathcal{J}_j$ is non-empty for all $j$. Throughout this work, we assume that the
$M \times M$ matrix $Z^T Z$ is positive definite, that $p$ and $M$ do not scale with $N$, and that $\lim_{N\to\infty} N^{-1} Z^T Z$ is finite and
positive definite.

Suppose that $Y$ is not generated by the model of Eq. (1), but by the model

$$ Y = Z_0 B_0 + \mathcal{E}. \tag{2} $$

Here, $Z_0$ is an $N \times M_0$ matrix of $N$ values of $M_0$ unknown true covariates, $B_0$ is an $M_0 \times p$ matrix of unknown
coefficients and each row of the $N \times p$ matrix $\mathcal{E}$ has independent $N_p(0, \Sigma_0)$ distribution with unknown covariance
matrix $\Sigma_0$. The row indices of the non-vanishing elements of the $j$-th column of $B_0$ are collected in the set
$\mathcal{J}_{0j}$. A measure of the discrepancy between the candidate (or approximating) model of Eq. (1) and the
data-generating model of Eq. (2) is the Kullback-Leibler information

$$ \Delta(B, \Sigma) = E_0\{-2\mathcal{L}(B, \Sigma)\} = Np \ln 2\pi + N \ln \mathrm{Det}\,\Sigma + \mathrm{Tr}\,(Z_0 B_0 - ZB)^T (Z_0 B_0 - ZB)\Sigma^{-1} + N\,\mathrm{Tr}\,\Sigma_0 \Sigma^{-1}, \tag{3} $$

where $E_0$ denotes expectation under the data-generating model and $\mathcal{L}(B, \Sigma)$ is the log-likelihood function of the
candidate model,

$$ -2\mathcal{L}(B, \Sigma) = Np \ln 2\pi + N \ln \mathrm{Det}\,\Sigma + \mathrm{Tr}\,(Y - ZB)^T (Y - ZB)\Sigma^{-1}. \tag{4} $$

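AIC is an estimator of the expected Kullback-Leibler information $E_0\{\Delta(\hat B, \hat\Sigma)\}$, where $\hat B$ and $\hat\Sigma$ are the maximum
likelihood estimators of, respectively, $B$ and $\Sigma$. It is defined as the sum of $-2\mathcal{L}(\hat B, \hat\Sigma)$ and twice the number of fitted
parameters,

$$ \mathrm{AIC}(\hat\Sigma) = N \ln \mathrm{Det}\,\hat\Sigma + Np(\ln 2\pi + 1) + 2K + p(p + 1). \tag{5} $$

In Appendix A, with the assumption that the candidate model is either correctly specified or overspecified ($Z_0 = Z$
and $\mathcal{J}_{0i} \subseteq \mathcal{J}_i$ for all $i$), we demonstrate that

$$ \mathcal{B}_{\mathrm{AIC}} = E_0\{\mathrm{AIC}(\hat\Sigma)\} - E_0\{\Delta(\hat B, \hat\Sigma)\} = -N^{-1}\beta(\Sigma_0) + o(N^{-1}), \tag{6} $$

where $\beta(\Sigma_0) = O(1)$ takes the form

$$ \beta(\Sigma_0) = 6K(p + 1) + 2\,\mathrm{Tr}(\mathrm{Tr}_S P_0)^2 - 3\,\mathrm{Tr}\,P_0 P_0^{T_S} - 3\,\mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T) + p(p + 1)^2. \tag{7} $$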

Here, the $Np \times Np$ oblique projection matrix $P_0$ is given by

$$ P_0 = X\{X^T(\Sigma_0^{-1} \otimes \mathbb{1}_N)X\}^{-1} X^T(\Sigma_0^{-1} \otimes \mathbb{1}_N), \tag{8} $$

where $X$ is an $Np \times K$ block-diagonal matrix of $p$ blocks of $N \times |\mathcal{J}_i|$ matrices $X_i$ holding the $|\mathcal{J}_i|$ columns of $Z$
corresponding to $z_j$ with $j \in \mathcal{J}_i$,

$$ X = \begin{pmatrix} X_1 & & \\ & \ddots & \\ & & X_p \end{pmatrix}. \tag{9} $$

In Eq. (7), the operators '$\mathrm{Tr}_S$' and '$\mathrm{Tr}_R$' denote partial traces over, respectively, the $N$ subjects and the $p$ response
variables. Given an $Np \times Np$ matrix $A$, $\mathrm{Tr}_S A$ is the $p \times p$ matrix defined componentwise as
$(\mathrm{Tr}_S A)_{ij} = \sum_{n=1}^{N} A_{in,jn}$, where $A_{in,jm}$ is multi-index notation for $A_{(i-1)N+n,(j-1)N+m}$. Similarly, $\mathrm{Tr}_R A$ is
the $N \times N$ matrix with elements $(\mathrm{Tr}_R A)_{nm} = \sum_{i=1}^{p} A_{in,im}$. Finally, in Eq. (7), '$T_S$' denotes the partial transpose
of subjects: $(A^{T_S})_{in,jm} = A_{im,jn}$.
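The partial traces and the partial transpose defined above can be made concrete with a small sketch. The following is a minimal illustration, not code from the paper: the function names and the nested-list matrix representation are assumptions, and the 0-based index `i * N + n` plays the role of the multi-index $(i-1)N + n$.

```python
def partial_trace_subjects(A, p, N):
    """Tr_S: p x p matrix with entries sum_n A[(i,n),(j,n)]."""
    return [[sum(A[i * N + n][j * N + n] for n in range(N)) for j in range(p)]
            for i in range(p)]


def partial_trace_responses(A, p, N):
    """Tr_R: N x N matrix with entries sum_i A[(i,n),(i,m)]."""
    return [[sum(A[i * N + n][i * N + m] for i in range(p)) for m in range(N)]
            for n in range(N)]


def partial_transpose_subjects(A, p, N):
    """T_S: result[(i,n),(j,m)] = A[(i,m),(j,n)] (subject indices swapped)."""
    out = [[0] * (N * p) for _ in range(N * p)]
    for i in range(p):
        for j in range(p):
            for n in range(N):
                for m in range(N):
                    out[i * N + n][j * N + m] = A[i * N + m][j * N + n]
    return out


def kron(S, T):
    """Kronecker product S (x) T, so that entry [(i,n),(j,m)] = S[i][j] * T[n][m]."""
    p, N = len(S), len(T)
    return [[S[i][j] * T[n][m] for j in range(p) for m in range(N)]
            for i in range(p) for n in range(N)]
```

For a Kronecker product $A = S \otimes T$ with $S$ of size $p \times p$ and $T$ of size $N \times N$, these operations reduce to $\mathrm{Tr}_S A = (\mathrm{Tr}\,T)\,S$, $\mathrm{Tr}_R A = (\mathrm{Tr}\,S)\,T$ and $A^{T_S} = S \otimes T^T$, which gives a quick sanity check of the index conventions.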

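If $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$, $\beta(\Sigma_0)$ collapses to

$$ \beta^* = 3K(p + 1) + 2K^2 p^{-1} + p(p + 1)^2, \tag{10} $$

which equals the coefficient of the first term of the expansion of $-\mathcal{B}_{\mathrm{AIC}}$ of Ref. [6] in inverse powers of $N$. In Appendix
B we demonstrate that

$$ \beta^* \le \min_\Omega \beta(\Omega), \tag{11} $$

where the minimization is over all $p \times p$ symmetric positive definite matrices $\Omega$ and the equality sign is attained if
and only if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$. We define AICc as

$$ \mathrm{AICc}(\hat\Sigma) = \mathrm{AIC}(\hat\Sigma) + N^{-1}\beta^*. \tag{12} $$

Because $0 < \beta^* \le \min_\Omega \beta(\Omega)$,

$$ \mathcal{B}_{\mathrm{AICc}} = E_0\{\mathrm{AICc}(\hat\Sigma)\} - E_0\{\Delta(\hat B, \hat\Sigma)\} = -N^{-1}\{\beta(\Sigma_0) - \beta^*\} + o(N^{-1}) \tag{13} $$

satisfies

$$ \lim_{N\to\infty} N\mathcal{B}_{\mathrm{AIC}} < \lim_{N\to\infty} N\mathcal{B}_{\mathrm{AICc}} \le 0. \tag{14} $$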
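Equations (5), (10) and (12) translate directly into a few lines of code. The sketch below is only an illustration: the function names and the choice to pass $\ln \mathrm{Det}\,\hat\Sigma$ as an argument are assumptions, and the maximum likelihood fit itself is taken as given.

```python
import math


def aic(logdet_sigma_hat, N, p, K):
    """AIC of Eq. (5); logdet_sigma_hat is ln Det of the fitted covariance matrix."""
    return N * logdet_sigma_hat + N * p * (math.log(2 * math.pi) + 1) + 2 * K + p * (p + 1)


def beta_star(p, K):
    """Coefficient beta* of Eq. (10); depends only on p and K."""
    return 3 * K * (p + 1) + 2 * K ** 2 / p + p * (p + 1) ** 2


def aicc(logdet_sigma_hat, N, p, K):
    """AICc of Eq. (12): AIC plus the correction beta*/N."""
    return aic(logdet_sigma_hat, N, p, K) + beta_star(p, K) / N
```

As a worked example, for $p = 2$ and $N = 15$ a candidate model with $K = 4$ has $\beta^* = 70$ and a correction of $70/15 \approx 4.7$, while a candidate with $K = 10$ has $\beta^* = 208$ and a correction of $208/15 \approx 13.9$. The penalty therefore grows faster than linearly in $K$ when $N$ is small.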

III. A SIMULATION STUDY 

We compare the performance of AIC, AICc and BIC in the selection of seemingly unrelated regressions models.
For this purpose, 1000 samples of sizes $N = 15$, $N = 20$ and $N = 50$ are created from the data-generating model (2)
with $p = 2$. For each sample and each criterion, the fitted candidate model with the smallest value of the criterion
is selected from a set of candidate models. The matrix $Z$ holds the values of 10 covariates and its $10N$ elements are
fixed after drawing them independently from $N(0, 1)$. We consider 25 candidate models specified by $\mathcal{J}_1 = \{1, \ldots, i\}$
and $\mathcal{J}_2 = \{6, \ldots, 5 + j\}$, where $i$ and $j$ are integers ranging from 1 to 5. For the data-generating model, we set $Z_0 = Z$
and take $\mathcal{J}_{01} = \{1, 2\}$ and $\mathcal{J}_{02} = \{6, 7\}$. The 4 non-vanishing elements of $B_0$ equal unity and the covariance matrix
$\Sigma_0$ has parametrization $\Sigma_0 = (1 - \rho)\mathbb{1}_p + \rho J_p$, where $J_p$ is the $p \times p$ matrix of ones and $|\rho| < 1$. The samples are
constructed based on 1000 independent drawings of $\mathcal{E}$, where each row of $\mathcal{E}$ is independently drawn from $N_p(0, \Sigma_0)$.
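The simulation design described above can be sketched in a few functions. This is an illustrative reading of the setup for $p = 2$, not the code used for the study: the function names are assumptions, and the correlated noise is generated from two independent standard normals through the Cholesky factor of $\Sigma_0 = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$.

```python
import math
import random


def draw_covariates(N, rng, M=10):
    """N x M matrix Z with independent N(0, 1) entries; drawn once, then kept fixed."""
    return [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]


def draw_responses(Z, rho, rng):
    """One sample Y = Z B0 + E with J01 = {1, 2}, J02 = {6, 7} and unit coefficients."""
    s = math.sqrt(1.0 - rho * rho)
    Y = []
    for z in Z:
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1, e2 = u1, rho * u1 + s * u2  # covariance [[1, rho], [rho, 1]]
        Y.append([z[0] + z[1] + e1, z[5] + z[6] + e2])
    return Y


def candidate_models():
    """The 25 candidates (J1, J2) with J1 = {1..i}, J2 = {6..5+j}, using 1-based indices."""
    return [(list(range(1, i + 1)), list(range(6, 6 + j)))
            for i in range(1, 6) for j in range(1, 6)]
```

A typical use would draw `Z` once per sample size and then call `draw_responses` 1000 times; each candidate `(J1, J2)` has $K = i + j$ regression coefficients.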

The candidate models are fitted with the constrained maximization (CM) algorithm [15, 16]:

$$ \hat\Sigma_{n+1} = N^{-1}(Y - Z\hat B_n)^T (Y - Z\hat B_n), \quad \text{where} \quad \mathrm{vec}(Z\hat B_n) = X\{X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)X\}^{-1} X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)\,\mathrm{vec}(Y). \tag{15} $$

Here, $\hat\Sigma_{n+1}$ and $\hat B_n$ are estimators of, respectively, $\Sigma$ and $B$, $n$ is a positive integer and 'vec' is the column-wise
vectorization operator. The algorithm is started with $\hat\Sigma_1 = \mathbb{1}_p$ and terminated if
$|\mathrm{Det}\,\hat\Sigma_{n+1} - \mathrm{Det}\,\hat\Sigma_n| < \delta\,\mathrm{Det}\,\hat\Sigma_n$, with $\delta$ equal to a small fixed power of ten. If the log-likelihood function
$\mathcal{L}(B, \Sigma)$ is globally concave, then $\hat\Sigma_{n+1}$ and $\hat B_n$ converge to, respectively, $\hat\Sigma$ and $\hat B$ and the numerical error of
$\ln \mathrm{Det}\,\hat\Sigma$ is of the order of magnitude of $\delta$. If $\mathcal{L}(B, \Sigma)$ is multi-modal, the CM algorithm does not necessarily
converge to the global maximum, but may end up in a local maximum or a saddle point [17, 18]. Although
multi-modality is rare, we choose several other initial estimators $\hat\Sigma_1$ and calculate $\ln \mathrm{Det}\,\hat\Sigma$ with a numerical error
of about $10\delta$ (see Appendix C for details). This means that the difference between two values of a criterion has a
numerical error of $20N\delta$.
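The iteration of Eq. (15) alternates a generalized least squares step for the coefficients with a residual-covariance update. The sketch below is a plain-Python illustration for $p = 2$ only, written under stated assumptions: the names `solve` and `cm_fit`, the default tolerance `delta=1e-8` and the iteration cap `max_iter` are illustrative choices, not values taken from the paper. The block structure of $X$ is exploited so that $X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)X$ has blocks $(\hat\Sigma_n^{-1})_{ij} X_i^T X_j$.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def cm_fit(Z, Y, J, delta=1e-8, max_iter=1000):
    """CM iteration of Eq. (15) for p = 2; J holds two lists of 1-based covariate indices.

    Returns (coefs, S): per-response coefficient lists and the 2 x 2 covariance estimate.
    """
    N = len(Y)
    X = [[[Z[n][c - 1] for c in Jj] for n in range(N)] for Jj in J]  # X[i][n][a]
    y = [[Y[n][j] for n in range(N)] for j in range(2)]
    size = [len(J[0]), len(J[1])]
    off = [0, size[0]]
    K = size[0] + size[1]
    S = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity matrix
    for _ in range(max_iter):
        det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
        W = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
        A = [[0.0] * K for _ in range(K)]
        rhs = [0.0] * K
        for i in range(2):
            for a in range(size[i]):
                for j in range(2):
                    rhs[off[i] + a] += W[i][j] * sum(X[i][n][a] * y[j][n] for n in range(N))
                    for b in range(size[j]):
                        A[off[i] + a][off[j] + b] = W[i][j] * sum(
                            X[i][n][a] * X[j][n][b] for n in range(N))
        beta = solve(A, rhs)  # generalized least squares step
        coefs = [beta[:size[0]], beta[size[0]:]]
        R = [[y[j][n] - sum(X[j][n][a] * coefs[j][a] for a in range(size[j]))
              for n in range(N)] for j in range(2)]
        S_new = [[sum(R[i][n] * R[j][n] for n in range(N)) / N for j in range(2)]
                 for i in range(2)]
        det_new = S_new[0][0] * S_new[1][1] - S_new[0][1] * S_new[1][0]
        S = S_new
        if abs(det_new - det) < delta * det:  # relative change of the determinant
            break
    return coefs, S
```

When both responses use the same covariates, the generalized least squares step does not depend on the covariance estimate, so the coefficients coincide with equation-by-equation ordinary least squares. This gives a convenient correctness check for the sketch.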

The frequencies of selecting the 25 candidate models with the three criteria are given in Table I for $\rho = 0.5$ and
$N = 15$. The correct model ($i = j = 2$) is more often selected with AICc than with AIC and BIC. To see how
the improvement of AICc on AIC is related to the bias correction, we have plotted $E_0\{\Delta(\hat B, \hat\Sigma)\}$, $E_0\{\mathrm{AICc}(\hat\Sigma)\}$ and
$E_0\{\mathrm{AIC}(\hat\Sigma)\}$ as a function of $i$ (with $j = 2$) in Fig. 1. The expected Kullback-Leibler information has a minimum
at $i = 2$ and increases rapidly with $i$ for $i > 2$. This increase is more precisely followed by $E_0\{\mathrm{AICc}(\hat\Sigma)\}$ than by
$E_0\{\mathrm{AIC}(\hat\Sigma)\}$, which explains why AIC more often selects models that are too complex. In Appendix C, the frequencies
of selecting the correct model with the three criteria are given for $\rho = 0.2, 0.5, 0.8$ and $N = 15, 20, 50$. The frequencies
do not depend much on $\rho$ and, as expected, the improvement of AICc on AIC decreases as $N$ increases. For $N = 20$,
AICc and BIC perform equally well, while for $N = 50$, the asymptotically consistent BIC outperforms AICc. In
Appendix C, we also demonstrate that $\delta$ is sufficiently small and that the results of Table I are not affected by
numerical errors.
TABLE I: Frequencies of selecting the 25 candidate models with AIC, AICc and BIC in 1000 samples of size $N = 15$ for $\rho = 0.5$.

            AIC                        AICc                       BIC
             j                          j                          j
 i     1    2    3    4    5  |   1    2    3    4    5  |   1    2    3    4    5

 1     0    1    0    0    0  |   0    2    0    1    0  |   0    2    0    0    0
 2     3  241   75   60   66  |   9  488   83   50   40  |   7  385   78   53   58
 3     1   53   20   16   34  |   5   64   14   14   12  |   4   59   15   18   23
 4     1   49   25   36   52  |   3   49   12   18   21  |   4   47   19   26   30
 5     4   63   31   46  123  |   4   37   12   17   45  |   1   50   20   29   72




FIG. 1: Expected Kullback-Leibler information (triangles), AICc (squares) and AIC (circles) as a function of $i$ with $j = 2$ for
$N = 15$ and $\rho = 0.5$. The expected criteria are estimated with the same 1000 samples as the ones of Table I. The standard
error of the expected AIC (and AICc) is about 0.3 for all $i$ and that of the expected Kullback-Leibler information ranges from
0.3 ($i = 1$) to 1.8 ($i = 5$).

IV. DISCUSSION 

In the simulation study of Sec. III, the data-generating model is finite dimensional and one of the candidate models
is correctly specified. The case of an infinite dimensional data-generating model is not considered here. Although in
this case the assumption of correct specification or overspecification does not hold for any candidate model, Hurvich
and Tsai [19] demonstrated that for linear regression models in small samples, AICc is much less biased than AIC for
most choices of the data-generating model. A similar study can be done for seemingly unrelated regressions models.
Also, for an infinite dimensional data-generating model, AIC and AICc are asymptotically efficient [20, 21] and, based
on the results of Ref. [19], it can be surmised that in small samples, AICc is more efficient than AIC and BIC for
most choices of the data-generating model.

APPENDIX A: BIAS OF AIC 

In this Appendix, we demonstrate that $\mathcal{B}_{\mathrm{AIC}} = -N^{-1}\beta(\Sigma_0) + o(N^{-1})$, where $\beta(\Sigma_0) = O(1)$ is given by Eq. (7).
First, we calculate $\gamma$ in the expansion

$$ \mathrm{AIC}(\hat\Sigma) - \Delta(\hat B, \hat\Sigma) = -\gamma + o_p(N^{-1}). \tag{A1} $$

Taking the expectation under the data-generating model of both sides of Eq. (A1) yields $\mathcal{B}_{\mathrm{AIC}} = -E_0(\gamma) + o(N^{-1})$.
Second, we calculate $E_0(\gamma)$ and find $\beta(\Sigma_0)$ from $E_0(\gamma) = N^{-1}\beta(\Sigma_0) + o(N^{-1})$.
1. The first term of the expansion of Eq. (A1)

We consider the expansion

$$ \lim_{n\to\infty}\{\mathrm{AIC}(\hat\Sigma_{n+1}) - \Delta(\hat B_n, \hat\Sigma_{n+1})\} = -\tilde\gamma + o_p(N^{-1}), \tag{A2} $$

where the estimators $\hat\Sigma_n$ and $\hat B_n$ of, respectively, $\Sigma$ and $B$ at the $n$-th step of the constrained maximization (CM)
algorithm, are given by Eq. (15). Depending on the initial estimator $\hat\Sigma_1$, Drton and Richardson [17] demonstrated
that the CM algorithm may end up in a local maximum or a saddle point of $\mathcal{L}(B, \Sigma)$, rather than in the global
maximum $\mathcal{L}(\hat B, \hat\Sigma)$. It turns out, however, that $\tilde\gamma$ does not depend on $\hat\Sigma_1$, which implies $\gamma = \tilde\gamma$.

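Because the candidate model is either correctly specified or overspecified, the left-hand side of Eq. (A2) can be
written as

$$ \begin{aligned} \lim_{n\to\infty}\{\mathrm{AIC}(\hat\Sigma_{n+1}) - \Delta(\hat B_n, \hat\Sigma_{n+1})\} ={}& 2K + p(p + 1) \\ & - N \lim_{n\to\infty} \mathrm{Tr}\,\Big\{ \underbrace{(\Sigma_0 - \hat\Sigma_{n+1})}_{O_p(N^{-1/2})} + \underbrace{N^{-1}\mathrm{Tr}_S P_n \varepsilon\varepsilon^T P_n^T}_{O_p(N^{-1})} \Big\}\hat\Sigma_{n+1}^{-1}, \end{aligned} \tag{A3} $$

where $\varepsilon = \mathrm{vec}(\mathcal{E})$, 'vec' is the column-wise vectorization operator,

$$ \hat\Sigma_{n+1} = N^{-1}\mathrm{Tr}_S(\mathbb{1}_{Np} - P_n)\varepsilon\varepsilon^T(\mathbb{1}_{Np} - P_n)^T \quad \text{and} \quad P_n = X\{X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)X\}^{-1}X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N). \tag{A4} $$

(By writing it as $N\,\mathrm{Tr}\,\hat\Sigma_{n+1}\hat\Sigma_{n+1}^{-1}$, the part $Np$ of AIC is absorbed in the second line of Eq. (A3).) The order symbols
below the horizontal curly braces in Eq. (A3) refer to the elements of the corresponding matrices. From now on,
when an order symbol refers to a matrix, all of its elements are of the indicated order. (The $Np \times Np$ matrix $P_n$ is
$O_p(N^{-1})$ because $N^{-1}Z^T Z = O(1)$.)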
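The matrix $\hat\Sigma_{n+1}^{-1}$ has expansion

$$ \hat\Sigma_{n+1}^{-1} = \Sigma_0^{-1}\sum_{j=0}^{Q}(-1)^j\{(\hat\Sigma_{n+1} - \Sigma_0)\Sigma_0^{-1}\}^j + o_p(N^{-Q/2}), \tag{A5} $$

where $Q$ is a non-negative integer. The expansion of Eq. (A5) holds because $p = O(1)$. Similarly, because $K = O(1)$,
the matrix $\{X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)X\}^{-1}$ has expansion

$$ \{X^T(\hat\Sigma_n^{-1} \otimes \mathbb{1}_N)X\}^{-1} = \{X^T(\Sigma_0^{-1} \otimes \mathbb{1}_N)X\}^{-1}\sum_{j=0}^{Q'}(-1)^j\big[X^T\{(\hat\Sigma_n^{-1} - \Sigma_0^{-1}) \otimes \mathbb{1}_N\}X\{X^T(\Sigma_0^{-1} \otimes \mathbb{1}_N)X\}^{-1}\big]^j + o_p(N^{-Q'/2-1}), \tag{A6} $$

where $Q'$ is a non-negative integer. By combining Eq. (A5) with $Q = 2$, Eq. (A6) with $Q' = 2$ and
$\lim_{n\to\infty}\hat\Sigma_n = N^{-1}\mathrm{Tr}_S(\mathbb{1}_{Np} - P_0)\varepsilon\varepsilon^T(\mathbb{1}_{Np} - P_0)^T + o_p(N^{-1})$, where $P_0 = O(N^{-1})$ is given by Eq. (8), we obtain

$$ \lim_{n\to\infty} P_n = P_0 + P_{-3/2} + P_{-2} + o_p(N^{-2}), \tag{A7} $$

where the matrices $P_{-3/2} = O_p(N^{-3/2})$ and $P_{-2} = O_p(N^{-2})$ are given by

$$ P_{-3/2} = P_0\{(N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T - \Sigma_0) \otimes \mathbb{1}_N\}(P_0^T - \mathbb{1}_{Np})(\Sigma_0^{-1} \otimes \mathbb{1}_N) \tag{A8} $$

and

$$ \begin{aligned} P_{-2} ={}& N^{-1}P_0\big[\{\mathrm{Tr}_S(\varepsilon\varepsilon^T P_0^T + P_0\varepsilon\varepsilon^T - P_0\varepsilon\varepsilon^T P_0^T)\} \otimes \mathbb{1}_N\big](\Sigma_0^{-1} \otimes \mathbb{1}_N)(\mathbb{1}_{Np} - P_0) \\ & - P_{-3/2}\{(N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T - \Sigma_0) \otimes \mathbb{1}_N\}(\Sigma_0^{-1} \otimes \mathbb{1}_N)(\mathbb{1}_{Np} - P_0). \end{aligned} \tag{A9} $$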
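By combining Eq. (A5) with $Q = 3$ and $\lim_{n\to\infty} P_n = P_0 + P_{-3/2} + o_p(N^{-3/2})$, we obtain

$$ \lim_{n\to\infty}\hat\Sigma_{n+1}^{-1} = \Sigma_0^{-1}\sum_{j=0}^{3}(-1)^j\Big(\big[N^{-1}\mathrm{Tr}_S\{\mathbb{1}_{Np} - (P_0 + P_{-3/2})\}\varepsilon\varepsilon^T\{\mathbb{1}_{Np} - (P_0 + P_{-3/2})\}^T - \Sigma_0\big]\Sigma_0^{-1}\Big)^j + o_p(N^{-3/2}). \tag{A10} $$

Substituting the expansions of Eqs. (A7) and (A10) in the right-hand side of Eq. (A3), expressing it as the right-hand
side of Eq. (A2), and noting that $\gamma = \tilde\gamma$ (because $\tilde\gamma$ does not depend on $\hat\Sigma_1$), yields

$$ \gamma = N\,\mathrm{Tr}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1} + \gamma_0 + \gamma_{-1/2} + \gamma_{-1}, \tag{A11} $$

where $\gamma_0 = O_p(1)$, $\gamma_{-1/2} = O_p(N^{-1/2})$ and $\gamma_{-1} = O_p(N^{-1})$ are given by

$$ \gamma_0 = -2K - p(p + 1) + \mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T)\Sigma_0^{-1} + N\,\mathrm{Tr}\{(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\}^2, \tag{A12} $$

$$ \begin{aligned} \gamma_{-1/2} ={}& 2\,\mathrm{Tr}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T)\Sigma_0^{-1} + N\,\mathrm{Tr}\{(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\}^3 \\ & - \mathrm{Tr}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}(\mathrm{Tr}_S P_0\varepsilon\varepsilon^T P_0^T)\Sigma_0^{-1} + \mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_{-3/2}^T + \mathrm{Tr}_S P_{-3/2}\varepsilon\varepsilon^T)\Sigma_0^{-1} \end{aligned} \tag{A13} $$

and

$$ \begin{aligned} \gamma_{-1} ={}& N^{-1}\mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T)\Sigma_0^{-1}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T - \mathrm{Tr}_S P_0\varepsilon\varepsilon^T P_0^T)\Sigma_0^{-1} \\ & + 2\,\mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_{-3/2}^T + \mathrm{Tr}_S P_{-3/2}\varepsilon\varepsilon^T)\Sigma_0^{-1}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1} \\ & + \mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_{-2}^T + \mathrm{Tr}_S P_{-2}\varepsilon\varepsilon^T)\Sigma_0^{-1} + N\,\mathrm{Tr}\{(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\}^4 \\ & - \mathrm{Tr}(\mathrm{Tr}_S P_{-3/2}\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T P_{-3/2}^T)\Sigma_0^{-1}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1} \\ & + 3\,\mathrm{Tr}(\mathrm{Tr}_S\varepsilon\varepsilon^T P_0^T + \mathrm{Tr}_S P_0\varepsilon\varepsilon^T)\Sigma_0^{-1}\{(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\}^2 \\ & - 2\,\mathrm{Tr}(\mathrm{Tr}_S P_0\varepsilon\varepsilon^T P_0^T)\Sigma_0^{-1}\{(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\}^2. \end{aligned} \tag{A14} $$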


2. Expectation under the data-generating model 

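The elements of the $Np$-dimensional Gaussian column vector $\varepsilon = \mathrm{vec}(\mathcal{E})$ have vanishing mean and two-point average

$$ E_0(\varepsilon_{in}\varepsilon_{jm}) = (\Sigma_0)_{ij}\delta_{nm}, \tag{A15} $$

where $\varepsilon_{in}$ is multi-index notation for $\varepsilon_{N(i-1)+n} = \mathcal{E}_{ni}$ and $\delta_{nm}$ is a Kronecker delta. Because
$E_0\{N\,\mathrm{Tr}(\Sigma_0 - N^{-1}\mathrm{Tr}_S\varepsilon\varepsilon^T)\Sigma_0^{-1}\} = 0$, $E_0(\gamma)$ takes the form

$$ E_0(\gamma) = E_0(\gamma_0) + E_0(\gamma_{-1/2}) + E_0(\gamma_{-1}). \tag{A16} $$

Applying Wick's theorem, which states that the average of a product of $2q$ elements of $\varepsilon$, where $q$ is a positive integer,
equals the sum of products of all $\prod_{i=1}^{q}(2i - 1)$ possible pairings of two-point averages, we obtain

$$ E_0(\gamma_0) = 0, \tag{A17} $$

$$ E_0(\gamma_{-1/2}) = N^{-1}\big[-6K(p + 1) + 3\,\mathrm{Tr}\,P_0 P_0^{T_S} + 3\,\mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T) - \{p^2 + 3p + p(p + 1)^2\}\big] \tag{A18} $$

and

$$ E_0(\gamma_{-1}) = N^{-1}\{12K(p + 1) + 2\,\mathrm{Tr}(\mathrm{Tr}_S P_0)^2 - 6\,\mathrm{Tr}\,P_0 P_0^{T_S} - 6\,\mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T) + p^2 + 3p + 2p(p + 1)^2\} + o(N^{-1}). \tag{A19} $$

Substituting Eqs. (A17), (A18) and (A19) in Eq. (A16) yields

$$ E_0(\gamma) = N^{-1}\beta(\Sigma_0) + o(N^{-1}), \tag{A20} $$

where $\beta(\Sigma_0) = O(1)$ is given by Eq. (7).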
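Wick's theorem, as used in this Appendix, can be illustrated by enumerating pairings explicitly. The following sketch is a generic illustration for a zero-mean Gaussian vector with a given covariance matrix; the function names are assumptions, and it is not the symbolic calculation that leads to Eqs. (A17) to (A19).

```python
def pairings(items):
    """Yield every way of splitting the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, rest[k])] + tail


def gaussian_moment(indices, cov):
    """E[x_a x_b ...] for a zero-mean Gaussian vector x with covariance `cov`.

    The moment is the sum over all pairings of products of two-point averages.
    An odd number of indices admits no pairing, so the result is 0.
    """
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total
```

For $2q$ factors the generator produces $\prod_{i=1}^{q}(2i - 1)$ pairings, for example 3 pairings for four factors and 15 for six. With unit variances and correlation 0.5, `gaussian_moment([0, 0, 1, 1], [[1, 0.5], [0.5, 1]])` evaluates $1 \cdot 1 + 2 \cdot 0.5^2 = 1.5$.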
APPENDIX B: PROOF OF EQ. (11)

In this Appendix, we demonstrate

$$ -3K(p + 1) + 2K^2 p^{-1} \le \min_\Omega\{2\,\mathrm{Tr}(\mathrm{Tr}_S P_0)^2 - 3\,\mathrm{Tr}\,P_0 P_0^{T_S} - 3\,\mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T)\}_{\Sigma_0 = \Omega}, \tag{B1} $$

where the minimization is over all $p \times p$ symmetric positive definite matrices $\Omega$ and the equality sign is attained if and
only if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$. By adding $6K(p + 1) + p(p + 1)^2$ on both sides of Eq. (B1), we obtain
$\beta^* \le \min_\Omega \beta(\Omega)$ of Eq. (11).

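Using $\mathrm{Tr}_S P_0 = \Sigma_0^{1/2}(\mathrm{Tr}_S A)\Sigma_0^{-1/2}$, where

$$ A = (\Sigma_0^{-1/2} \otimes \mathbb{1}_N)P_0(\Sigma_0^{1/2} \otimes \mathbb{1}_N) = (\Sigma_0^{-1/2} \otimes \mathbb{1}_N)X\{X^T(\Sigma_0^{-1} \otimes \mathbb{1}_N)X\}^{-1}X^T(\Sigma_0^{-1/2} \otimes \mathbb{1}_N), \tag{B2} $$

we find that $\mathrm{Tr}(\mathrm{Tr}_S P_0)^2$ can be written as

$$ \mathrm{Tr}(\mathrm{Tr}_S P_0)^2 = \mathrm{Tr}(\mathrm{Tr}_S A)^2. \tag{B3} $$

From

$$ \min_C\,(\mathrm{Tr}\,C^2 \mid \mathrm{Tr}\,C = K) = K^2 p^{-1}, \tag{B4} $$

where the minimization is over all $p \times p$ symmetric matrices $C$, we obtain

$$ \min_\Omega\{\mathrm{Tr}(\mathrm{Tr}_S P_0)^2\}_{\Sigma_0 = \Omega} = \min_\Omega\{\mathrm{Tr}(\mathrm{Tr}_S A)^2\}_{\Sigma_0 = \Omega} \ge K^2 p^{-1}. \tag{B5} $$

The minimum of Eq. (B4) is attained if and only if $C_{ii} = Kp^{-1}$ and $C_{ij} = 0$ for all $i \ne j$. This corresponds to
$\mathrm{Tr}_S A = Kp^{-1}\mathbb{1}_p$, which can be reached if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$ or if $\Sigma_0 = \mathbb{1}_p$ and $|\mathcal{J}_i| = |\mathcal{J}_j|$ for all $i$ and $j$.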
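The minimization of Eq. (B4) can be verified in two elementary steps. The following is a short sketch that uses only the symmetry of $C$ and the constraint $\mathrm{Tr}\,C = K$:

```latex
\begin{align*}
\mathrm{Tr}\,C^2 &= \sum_{i=1}^{p}\sum_{j=1}^{p} C_{ij}^2 \;\ge\; \sum_{i=1}^{p} C_{ii}^2
  && \text{(equality iff } C_{ij} = 0 \text{ for all } i \neq j\text{)},\\
\sum_{i=1}^{p} C_{ii}^2 &\ge \frac{1}{p}\Big(\sum_{i=1}^{p} C_{ii}\Big)^2 = K^2 p^{-1}
  && \text{(Cauchy-Schwarz; equality iff } C_{ii} = K p^{-1} \text{ for all } i\text{)}.
\end{align*}
```

Both inequalities are tight simultaneously exactly when $C = Kp^{-1}\mathbb{1}_p$, which is the condition stated for the minimum.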
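Because

$$ P_0^{T_S} = (\Sigma_0^{1/2} \otimes \mathbb{1}_N)A^{T_S}(\Sigma_0^{-1/2} \otimes \mathbb{1}_N), \tag{B6} $$

$\mathrm{Tr}\,P_0 P_0^{T_S}$ equals the inner product of $A^{T_S}$ and $A$:

$$ \mathrm{Tr}\,P_0 P_0^{T_S} = \mathrm{Tr}\,A^{T_S}A^T. \tag{B7} $$

The squared length $\mathrm{Tr}\,AA^T$ of $A$ equals $K$ ($A$ is an orthogonal projection matrix of rank $K$). The squared length of
$A^{T_S}$ equals that of $A$ and we have

$$ \max_\Omega\{\mathrm{Tr}\,P_0 P_0^{T_S}\}_{\Sigma_0 = \Omega} = \max_\Omega\{\mathrm{Tr}\,A^{T_S}A^T\}_{\Sigma_0 = \Omega} \le \{\mathrm{Tr}\,AA^T\;\mathrm{Tr}\,A^{T_S}(A^{T_S})^T\}^{1/2} = K. \tag{B8} $$

The upper bound of $K$ in Eq. (B8) is attained if and only if $A = A^{T_S}$, which can be reached if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and
$j$ or if $\Sigma_0 = \mathbb{1}_p$.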

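Using $\mathrm{Tr}_R P_0 = \mathrm{Tr}_R A$, we find

$$ \mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T) = \mathrm{Tr}(\mathrm{Tr}_R A)(\mathrm{Tr}_R A)^T = \sum_{i=1}^{p}\sum_{j=1}^{p}\mathrm{Tr}\,a_{ii}a_{jj}^T, \tag{B9} $$

where $a_{ij}$ is the $ij$-th $N \times N$ submatrix of $A$. The sum of the squared lengths of the $a_{ii}$'s is bounded by

$$ \sum_{i=1}^{p}\mathrm{Tr}\,a_{ii}a_{ii}^T \le \sum_{i=1}^{p}\sum_{j=1}^{p}\mathrm{Tr}\,a_{ij}a_{ij}^T = \mathrm{Tr}\,AA^T = K. \tag{B10} $$

The upper bound $pK$ of

$$ \max_{c_{ii}}\Big\{\sum_{i=1}^{p}\sum_{j=1}^{p}\mathrm{Tr}\,c_{ii}c_{jj}^T \,\Big|\, \sum_{i=1}^{p}\mathrm{Tr}\,c_{ii}c_{ii}^T \le K\Big\} \le pK, \tag{B11} $$

where the maximization is over $p$ symmetric $N \times N$ matrices $c_{ii}$, is attained if and only if $c_{ii} = c_{jj}$ and
$\sum_{i=1}^{p}\mathrm{Tr}\,c_{ii}c_{ii}^T = K$. Translated to $A$, this means that $a_{ij} = 0$ for all $i \ne j$ and the $a_{ii}$'s are identical orthogonal
projection matrices of rank $Kp^{-1}$. This can be reached if and only if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$. It follows that

$$ \max_\Omega\{\mathrm{Tr}(\mathrm{Tr}_R P_0)(\mathrm{Tr}_R P_0^T)\}_{\Sigma_0 = \Omega} \le pK, \tag{B12} $$

where the equality sign is attained if and only if $\mathcal{J}_i = \mathcal{J}_j$ for all $i$ and $j$.
By combining the bounds of Eqs. (B5), (B8) and (B12), we obtain Eq. (B1).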



APPENDIX C: DETAILS ABOUT THE SIMULATION STUDY 



In this Appendix, we give the algorithm used to calculate the maximum likelihood estimators. Also, additional
simulation results are presented and $\delta$ is demonstrated to be sufficiently small.



1. Calculating the maximum likelihood estimators 



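The CM algorithm is run with $\hat\Sigma_1 = \mathbb{1}_p$ and, after convergence is achieved ($|\mathrm{Det}\,\hat\Sigma_{n+1} - \mathrm{Det}\,\hat\Sigma_n| < \delta\,\mathrm{Det}\,\hat\Sigma_n$), we set
$\hat\Sigma_{\mathrm{temp}} = \hat\Sigma_{n+1}$ and $\hat B_{\mathrm{temp}} = \hat B_n$. Then, another $\hat\Sigma_1$ is constructed by drawing a $p \times p$ matrix from $W_p(\mathbb{1}_p, p)$, where
$W$ denotes a Wishart distribution, and dividing it by $p$. With the randomly created $\hat\Sigma_1$, the CM algorithm is run
up to convergence (possibly with another number of iterations than in the previous run) and if the newly calculated
$\hat\Sigma_{\mathrm{new}} = \hat\Sigma_{n+1}$ and $\hat B_{\mathrm{new}} = \hat B_n$ satisfy

$$ \mathcal{L}(\hat B_{\mathrm{new}}, \hat\Sigma_{\mathrm{new}}) > \mathcal{L}(\hat B_{\mathrm{temp}}, \hat\Sigma_{\mathrm{temp}}) \quad \text{and} \quad |\mathrm{Det}\,\hat\Sigma_{\mathrm{new}} - \mathrm{Det}\,\hat\Sigma_{\mathrm{temp}}| > 10\delta\,\mathrm{Det}\,\hat\Sigma_{\mathrm{temp}}, \tag{C1} $$

we set $\hat\Sigma_{\mathrm{temp}} = \hat\Sigma_{\mathrm{new}}$ and $\hat B_{\mathrm{temp}} = \hat B_{\mathrm{new}}$. The above is repeated until $\hat\Sigma_{\mathrm{temp}}$ and $\hat B_{\mathrm{temp}}$ remain unchanged for 10
different randomly created $\hat\Sigma_1$'s in a row. When the algorithm is terminated, $\mathcal{L}(\hat B_{\mathrm{temp}}, \hat\Sigma_{\mathrm{temp}})$ is considered to be the
global maximum of $\mathcal{L}(B, \Sigma)$ and we set $\hat B = \hat B_{\mathrm{temp}}$ and $\hat\Sigma = \hat\Sigma_{\mathrm{temp}}$. Table II holds the number of jumps of $\hat\Sigma_{\mathrm{temp}}$ (and
$\hat B_{\mathrm{temp}}$) in 1000 samples of size $N = 15$ for $\rho = 0.5$. (These are the same samples as the ones of Sec. III.) It turns out
that multi-modality is indeed rare: the candidate model with $i = j = 5$ has a multi-modal $\mathcal{L}(B, \Sigma)$ in at most 1.6%
of the 1000 samples. Table II also holds the number of additional $\hat\Sigma_1$'s in the 1000 samples. There are about 2 to 3
additional $\hat\Sigma_1$'s per jump and a maximum of 5 additional $\hat\Sigma_1$'s per jump.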
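The restart procedure above can be sketched as a small driver around any fitting routine. This is a minimal illustration under stated assumptions: `fit` is a hypothetical callable that runs the CM algorithm from a given starting matrix (with `None` standing for the identity start) and returns a tuple `(loglik, det_sigma, estimate)`; the names `wishart_start`, `multistart` and `patience` are likewise illustrative.

```python
import random


def wishart_start(p, rng):
    """Draw a p x p matrix from W_p(1_p, p) and divide it by p."""
    G = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]  # p draws of N_p(0, 1_p)
    return [[sum(G[k][a] * G[k][b] for k in range(p)) / p for b in range(p)]
            for a in range(p)]


def multistart(fit, p, delta, rng, patience=10):
    """Repeat random restarts until `patience` consecutive restarts bring no jump.

    A restart replaces the current result only if it satisfies both parts of Eq. (C1):
    a larger log-likelihood and a determinant that differs by more than 10 * delta
    relative to the current one. Returns the final result and the number of jumps.
    """
    best = fit(None)  # first run: identity starting matrix
    jumps = 0
    unchanged = 0
    while unchanged < patience:
        cand = fit(wishart_start(p, rng))
        better = cand[0] > best[0]
        distinct = abs(cand[1] - best[1]) > 10 * delta * best[1]
        if better and distinct:
            best, jumps, unchanged = cand, jumps + 1, 0
        else:
            unchanged += 1
    return best, jumps
```

The second condition of Eq. (C1) prevents restarts that reach the same maximum up to numerical error from being counted as jumps.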



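TABLE II: Number of jumps of $\hat\Sigma_{\mathrm{temp}}$ and number of additional $\hat\Sigma_1$'s in 1000 samples of size $N = 15$ for $\rho = 0.5$.

            jumps                      additional $\hat\Sigma_1$'s
              j                          j
 i     1    2    3    4    5  |   1    2    3    4    5

 1     0    0    0    0    0  |   0    0    0    0    0
 2     0    0    0    1    1  |   0    0    0    5    2
 3     0    0    1    2    1  |   0    0    5    2    5
 4     0    2    1    2    7  |   0    4    1    4   18
 5     0    2    5    5   16  |   0    3   10   10   38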

2. Additional simulation results 



The frequencies of selecting the correct model with AIC, AICc and BIC in 1000 samples of sizes $N = 15, 20, 50$ are
given in Table III for $\rho = 0.2, 0.5, 0.8$. In the simulation, the values of the covariates in the samples of sizes $N = 15$ and
$N = 20$ are the same as, respectively, the first 15 and 20 values of the covariates in the samples of size $N = 50$. Table
III also holds the number of times that the difference between the second smallest and smallest value of a criterion
is less than $200N\delta$ (10 times the numerical error of the difference). These numbers are of order unity, such that $\delta$ is
sufficiently small. The average (over 1000 samples) of the difference between the second smallest and smallest value
of a criterion is not given in Table III, but it ranges from 3.8 (for AIC with $N = 15$ and $\rho = 0.2$) to 52.4 (for BIC
with $N = 50$ and $\rho = 0.8$). In all 1000 samples of sizes $N = 20$ and $N = 50$, there are no jumps of $\hat\Sigma_{\mathrm{temp}}$. In the
samples of size $N = 15$, the number of jumps of $\hat\Sigma_{\mathrm{temp}}$ and the number of additional $\hat\Sigma_1$'s do not depend much on $\rho$.
TABLE III: Frequencies $f$ and $\nu$ in 1000 samples of, respectively, selecting the correct model and the difference between the
second smallest and smallest value of a criterion being smaller than $200N\delta$.

                       f                      $\nu$
  N    $\rho$     AIC  AICc   BIC   |   AIC  AICc   BIC

 15    0.2     249   500   397   |     0     1     0
       0.5     241   488   385   |     0     0     0
       0.8     233   473   365   |     1     0     0
 20    0.2     366   570   577   |     0     0     0
       0.5     353   604   611   |     0     2     0
       0.8     324   553   556   |     0     0     0
 50    0.2     493   590   835   |     0     1     0
       0.5     528   616   832   |     0     1     0
       0.8     503   594   848   |